Chaotic mixed excitation source for speech synthesis
نویسندگان
چکیده
Linear Prediction (LP) analysis has proven to be very powerful and widely used method in speech analysis and synthesis. Synthesis by LP-based approach is carried by exciting an allpole model (whose parameters are derived by LP analysis). Synthesis is carried by using mixed excitation source consisting of a sequence of impulses for voiced regions and white-noise source for unvoiced regions. In this paper, we present novel chaotic excitation source using chaotic titration method. The voiced and unvoiced regions in speech are modeled by chaos which is quantified by adding noise of known standard deviation (determined using chaotic titration method). It is observed that on an average for synthesized voices (both male and female), MOS increases from 2 to 2.4, DMOS from 2.1 to 2.4 and preference is increased from 39 % to 61 % via A/B test. PESQ score increases from 1 to 1.5 and MCD score decreases from 4.06 to 4.03, relatively for voices synthesized by proposed chaotic mixed excitation source. The relatively better performance of proposed approach is may be due to the novel chaotic mixed source of excitation.
منابع مشابه
A mixed-excitation frequency domain model for time-scale pitch-scale modification of speech
This paper presents a time-scale pitch-scale modification technique for concatenative speech synthesis. The method is based on a frequency domain source-filter model, where the source is modeled as a mixed excitation. This model is highly coupled with a compression scheme that result in compact acoustic inventories. When compared to the approach in the Whistler system using no mixed excitation,...
متن کاملTowards an improved modeling of the glottal source in statistical parametric speech synthesis
This paper proposes the use of the Liljencrants-Fant model (LFmodel) to represent the glottal source signal in HMM-based speech synthesis systems. These systems generally use a pulse train to model the periodicity of the excitation signal of voiced speech. However, this model produces a strong and uniform harmonic structure throughout the spectrum of the excitation which makes the synthetic spe...
متن کاملNICT Blizzard Challenge 2010 Entry
This paper details a speech synthesis system developed at NICT for the Blizzard Challenge 2010. The system depends on an HMM-based speech synthesis technique that possesses two distinctive features: HMM training under global-variance constraint on the parameter trajectory and trainable mixed excitation for source-filter vocoding. For this year’s entry, we added some modifications to the system ...
متن کاملSpeech enhancement using voice source models
Autoregressive (AR) models have been shown to be effective models of speech signal. However, although it is the most common mode1 of speech, an AR process excited by white noise for speech enhancement, fails to capture the effects of source excitation, especidy the quasi periodic nature of voiced speech. Speech synthesis researchers have long recognized this ~roblern and have developed a variet...
متن کاملAperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
A new control paradigm of source signals for high quality speech synthesis is introduced to handle a variety of speech quality, based on timefrequency analyses by the use of an instantaneous frequency and group delay. The proposed signal representation consists of a frequency domain aperiodicity measure and a time domain energy concentration measure to represent source attributes, which supplem...
متن کامل